Conversation
b7bbf03 to
1bccd37
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds comprehensive pooling operations support to the ckTileEngine, implementing 2D and 3D pooling with configurable reduction operations (max, min, average). The implementation follows the existing GEMM module architecture with individual kernel compilation for improved build parallelism and testing capabilities.
Key Changes:
- Added pooling kernel generation system with Python-based instance builder
- Implemented benchmark infrastructure for pooling operations
- Created configuration system for kernel parameters and GPU target selection
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| tile_engine/ops/pooling/pool_profiler.hpp | Profiler class for benchmarking pooling kernels with verification and performance metrics |
| tile_engine/ops/pooling/pool_instance_builder.py | Python script to generate C++ kernel instances from JSON configurations |
| tile_engine/ops/pooling/pool_common.hpp | Common type traits and data type definitions for pooling operations |
| tile_engine/ops/pooling/pool_benchmark_single.cpp | Single kernel benchmark executable with argument parsing and verification |
| tile_engine/ops/pooling/pool_benchmark.py | Python orchestration script for running and analyzing benchmark sweeps |
| tile_engine/ops/pooling/pool_benchmark.hpp | Core data structures and enums for pooling benchmarks |
| tile_engine/ops/pooling/configs/default_config.json | Default configuration for block sizes, warp configs, and traits |
| tile_engine/ops/pooling/README.md | Comprehensive documentation for building, running, and configuring pooling benchmarks |
| tile_engine/ops/pooling/CMakeLists.txt | CMake build system for generating and compiling individual pooling kernels |
| tile_engine/ops/gemm/README.md | Added missing gpu_target argument to example command |
| tile_engine/ops/commons/test_benchmark.sh | Extended test script to support both GEMM and pooling benchmarks |
| tile_engine/ops/CMakeLists.txt | Added pooling subdirectory to build system |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "Sz": stride_z, | ||
| "Sy": stride_y, | ||
| "Sx": stride_x, | ||
| "pool_dim": pool_dim, |
There was a problem hiding this comment.
The parameter 'pool_dim' is included in the params dictionary but is not a valid command-line parameter for the benchmark executables according to pool_benchmark_single.cpp (which doesn't define a 'pool_dim' argument). This will be silently ignored or could cause errors. The pool dimension is determined by the kernel itself (POOL_DIM constant), not passed as an argument.
| "pool_dim": pool_dim, |
There was a problem hiding this comment.
I have a general question about the instance builder. In the GEMM tile engine, there’s a gemm_validation_utils.py module that is used by gemm_instance_builder to generate only valid instances that can later be compiled. However, it seems that for pooling, there isn’t a similar validation step. and, some checks are performed directly within pool_instance_builder. Would it make sense to adopt a similar structure for these tile engine operations to keep things same?
|
Error importing due to merge conflicts – please reopen the PR on ROCm/rocm-libraries |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| string(REPLACE "_" ";" trait_parts ${trait}) | ||
| list(GET trait_parts 0 output_index) | ||
| list(GET trait_parts 1 propagate_nan) | ||
| list(GET trait_parts 2 pool_dim) |
There was a problem hiding this comment.
We have both the pool_dim in command line and the json input. Could we unify that and only keep 1? I suggest we only keep it in json.
|
Applied review changes in rocm-libraries repo ROCm/rocm-libraries#4469 |
Proposed changes
Add pooling to ckTileEngine
Checklist
Please put an
xinto the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.clang-formaton all changed files